coding theorem. The nonadditive information content we adopted is consistent with the concept of

I. INTRODUCTION



The intuitive notion of what a quantitative expression for information should be has been addressed in the
development of the transmission of information, which led to information theory (IT). IT is today considered to be
the most fundamental field connecting various other fields such as physics (thermodynamics), electrical engineering
(communication theory), mathematics (probability theory and statistics), computer science (Kolmogorov complexity)
and so on [ref]. Accordingly, the selection of the information measure becomes an influential matter. The introduction
of the logarithmic form of the information measure dates back to Hartley [ref]. He defined the practical measure of
information as the logarithm of the number of possible symbol sequences. After that, Shannon established the
logarithm-based IT for the following reasons: (a) practical usefulness, (b) closeness to our intuitive feeling,
(c) ease of mathematical manipulation.

On the other hand, a non-logarithmic (nonextensive) form of entropy is currently considered a useful
measure for describing thermostatistical properties of a certain class of physical systems that entail long-range
interactions, long-time memories and (multi)fractal structures. The form of the nonextensive entropy proposed by
Tsallis [ref] has been intensively applied to many such systems [ref]. The reason why a formalism violating the
extensivity of the statistical mechanical entropy seems to be essential for a convincing description of these systems
has not yet been sufficiently revealed. Nevertheless, the successful application to some physical systems leads us to
investigate the possibility of a nonadditive IT, since Shannon's information entropy has the same form as the
logarithmic statistical mechanical entropy.

It is desirable to employ a nonadditive information content for which the associated IT contains Shannon's IT as a
special case. The concept of form invariance of the structure of nonextensive entropies was considered to provide
a guiding principle for a clear basis for generalizations of the logarithmic entropy [ref]. This structure seems to
give a hint for the selection of the nonadditive information content. The form-invariant structure requires
normalization of the original Tsallis entropy by $\sum_i p_i^q$, where $p_i$ is the probability of event $i$ and $q$ is a
real number residing in the interval (0, 1) from the preservation of concavity of the entropy. In addition, the
Kullback-Leibler (KL) relative entropy, which measures the distance between two probability distributions, is also
modified [ref].

This paper explores the consequences of adopting a nonadditive information content in the sense that the associated
average information, i.e., the entropy, takes the form of the modified Tsallis entropy. The use of the modified form of
the Tsallis entropy is in conformity with the appropriate definition of the expectation value (the normalized
q-expectation value [ref]) of the nonadditive information content. Since the information theoretical entropy is defined
as the average of the information content, it is desirable to unify the meaning of the average as the normalized
q-expectation value throughout the nonadditive IT. Moreover, we shall later see how Shannon's additive IT is extended
to the nonadditive one by addressing the source coding theorem, which is one of the most fundamental theorems in IT.

The organization of this paper is as follows. In Sec. II, we present the mathematical preliminaries of the nonadditive
entropy and the generalized KL entropy. Sec. III addresses the optimal codeword within the nonadditive framework.
There we shall attempt to give a possible meaning of the nonadditive index q in terms of codeword length. Sec. IV
deals with the extension of Fano's inequality, which gives an upper bound on the conditional entropy in terms of the
error probability in a channel. The last section is devoted to concluding remarks.

II. NONADDITIVE ENTROPY AND THE GENERALIZED KL ENTROPY



A. Properties of nonadditive entropy of information 

For a discrete set of states with probability mass function $p(x)$, where $x$ belongs to the alphabet $\mathcal{H}$, we
consider the following nonadditive information content $I_q(p)$,

$$ I_q(p) = -\ln_q p(x), \tag{1} $$

where $\ln_q x$ is the q-logarithm function defined as $\ln_q x = (x^{1-q} - 1)/(1-q)$. In the limit $q \to 1$, $\ln_q x$
recovers the standard logarithm $\ln x$. In Shannon's additive IT, the information content is expressed as $-\ln p(x)$ in
nat units, which is a monotonically decreasing function of $p(x)$. This behavior matches our intuition in that we get
more information when the least probable event occurs and less information when an event with high probability
occurs. It is worth noting that this property remains qualitatively valid for the nonadditive information content for
all q, except that (for q < 1) there exists an upper limit $1/(1-q)$ at $p(x) = 0$. Therefore Shannon's reason (b)
referred to in Sec. I is not considered a crucial element for determining the logarithmic form. Moreover, it is easy to
see that the Rényi information of order q, $H_q^R = \ln \sum_x p^q(x) / (1-q)$, which is an additive information measure,
can be written with this nonadditive information content as

$$ H_q^R = \frac{\ln \sum_x \left[ 1 - (1-q) I_q(p(x)) \right]^{\frac{q}{1-q}}}{1-q}. \tag{2} $$
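
As a quick numerical illustration of Eqs. (1) and (2), the following minimal Python sketch evaluates the q-logarithm, the nonadditive information content, and the Rényi information computed in two ways. The function names and the test distribution are illustrative assumptions, not part of the original formulation.

```python
import math

def ln_q(x, q):
    """q-logarithm ln_q x = (x**(1-q) - 1) / (1-q); ordinary ln x at q = 1."""
    if q == 1.0:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def info_content(p, q):
    """Nonadditive information content I_q(p) = -ln_q p, Eq. (1)."""
    return -ln_q(p, q)

def renyi_direct(probs, q):
    """Renyi information of order q: ln(sum p^q) / (1-q)."""
    return math.log(sum(p ** q for p in probs)) / (1.0 - q)

def renyi_from_info(probs, q):
    """Eq. (2): the same quantity expressed through I_q."""
    total = sum((1.0 - (1.0 - q) * info_content(p, q)) ** (q / (1.0 - q))
                for p in probs)
    return math.log(total) / (1.0 - q)

probs = [0.5, 0.25, 0.125, 0.125]  # illustrative distribution (an assumption)
for q in (0.5, 2.0):
    print(q, renyi_direct(probs, q), renyi_from_info(probs, q))

# For q < 1 the information content saturates at 1/(1-q) when p -> 0.
print(info_content(0.0, 0.5))  # 2.0
```

The two Rényi columns should agree up to rounding, and the last line shows the finite upper limit $1/(1-q)$ mentioned in the text.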

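The entropy $H_q(X)$ of a discrete random variable X is defined as the average of the information content, where the
average means the normalized q-expectation value [ref],

$$ H_q(X) = \frac{\sum_{x\in\mathcal{H}} p^q(x) I_q(p)}{\sum_{x\in\mathcal{H}} p^q(x)} = \frac{1 - \sum_{x\in\mathcal{H}} p^q(x)}{(q-1)\sum_{x\in\mathcal{H}} p^q(x)}, \tag{3} $$

where we have used the normalization of probability $\sum_{x\in\mathcal{H}} p(x) = 1$. In a similar way, we define the
nonadditive conditional information content and the joint one as follows:

$$ I_q(y|x) = -\frac{p^{1-q}(y|x) - 1}{1-q}, \tag{4} $$

$$ I_q(x,y) = -\frac{p^{1-q}(x,y) - 1}{1-q}, \tag{5} $$

where y belongs to a different alphabet $\mathcal{Y}$. The corresponding entropy conditioned on x and the joint entropy
of X and Y become

$$ H_q(Y|x) = \frac{\sum_{y\in\mathcal{Y}} p^q(y|x) I_q(y|x)}{\sum_{y\in\mathcal{Y}} p^q(y|x)} = \frac{1 - \sum_{y\in\mathcal{Y}} p^q(y|x)}{(q-1)\sum_{y\in\mathcal{Y}} p^q(y|x)} \tag{6} $$

and

$$ H_q(X,Y) = \frac{\sum_{x\in\mathcal{H},y\in\mathcal{Y}} p^q(x,y) I_q(x,y)}{\sum_{x\in\mathcal{H},y\in\mathcal{Y}} p^q(x,y)} = \frac{1 - \sum_{x\in\mathcal{H},y\in\mathcal{Y}} p^q(x,y)}{(q-1)\sum_{x\in\mathcal{H},y\in\mathcal{Y}} p^q(x,y)}, \tag{7} $$

respectively. Then we have the following theorem.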
Theorem 1.

The joint entropy satisfies

$$ H_q(X,Y) = H_q(X) + H_q(Y|X) + (q-1) H_q(X) H_q(Y|X). \tag{8} $$



Proof. 



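From Eq. (7) we can rewrite $H_q(X,Y)$ with the relation $p(x,y) = p(x)p(y|x)$ as

$$ H_q(X,Y) = \frac{1}{q-1}\left[ \frac{1}{\sum_x p^q(x) \sum_y p^q(y|x)} - 1 \right]. \tag{9} $$

Since Eq. (6) gives $\sum_y p^q(y|x) = [1 + (q-1)H_q(Y|x)]^{-1}$, we get

$$ H_q(X,Y) = \frac{1}{q-1}\left[ \left( \sum_x \frac{p^q(x)}{1 + (q-1)H_q(Y|x)} \right)^{-1} - 1 \right]. \tag{10} $$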



Here, we introduce the following definition [10]:

$$ \left\langle \frac{1}{1 + (q-1)H_q(Y|x)} \right\rangle_q = \frac{1}{1 + (q-1)H_q(Y|X)}, \tag{11} $$

where the bracket denotes the normalized q-expectation value with respect to $p(x)$. Then we have from Eq. (10)

$$ H_q(X,Y) = \frac{1}{q-1}\left[ \frac{1 + (q-1)H_q(Y|X)}{\sum_x p^q(x)} - 1 \right]. \tag{12} $$

Putting $\sum_x p^q(x) = [1 + (q-1)H_q(X)]^{-1}$ into this yields the theorem. □
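
Theorem 1 can be checked numerically. The sketch below computes $H_q(X)$, $H_q(Y|X)$ (through the normalized q-expectation of Eq. (11)) and $H_q(X,Y)$ for a small joint distribution and prints the difference between the two sides of Eq. (8). The joint distribution and the helper names are illustrative assumptions.

```python
def q_sum(probs, q):
    """sum_x p^q(x), skipping zero-probability entries."""
    return sum(p ** q for p in probs if p > 0)

def entropy_q(probs, q):
    """H_q of Eq. (3): (1 - sum p^q) / ((q-1) sum p^q), for q != 1."""
    s = q_sum(probs, q)
    return (1.0 - s) / ((q - 1.0) * s)

def cond_entropy_q(joint, q):
    """H_q(Y|X) from Eqs. (6) and (11); joint[x][y] = p(x, y), each p(x) > 0."""
    px = [sum(row) for row in joint]
    weighted = 0.0
    for row, p_x in zip(joint, px):
        h_row = entropy_q([p / p_x for p in row], q)   # H_q(Y|x), Eq. (6)
        weighted += p_x ** q / (1.0 + (q - 1.0) * h_row)
    mean = weighted / q_sum(px, q)                      # <...>_q in Eq. (11)
    return (1.0 / mean - 1.0) / (q - 1.0)

def theorem1_gap(joint, q):
    """Left-hand side minus right-hand side of Eq. (8)."""
    h_x = entropy_q([sum(row) for row in joint], q)
    h_yx = cond_entropy_q(joint, q)
    h_xy = entropy_q([p for row in joint for p in row], q)
    return h_xy - (h_x + h_yx + (q - 1.0) * h_x * h_yx)

joint = [[0.3, 0.2], [0.1, 0.4]]  # illustrative p(x, y), an assumption
for q in (0.5, 2.0):
    print(q, theorem1_gap(joint, q))  # expected to vanish up to rounding
```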

This theorem has a remarkable similarity to the relation satisfied by the Jackson basic number in q-deformation
theory, as was pointed out in [ref]. That is, $[X]_q \equiv (q^X - 1)/(q - 1)$ is the Jackson basic number of a
quantity X. Then, for the sum of two quantities X and Y, the associated basic number $[X+Y]_q$ is shown to become
$[X]_q + [Y]_q + (q-1)[X]_q[Y]_q$. Obviously this theorem recovers the ordinary relation $H(X,Y) = H(X) + H(Y|X)$ in
the limit $q \to 1$. In this modified Tsallis formalism, q appears as $q-1$ instead of as $1-q$ [ref]. When X and Y
are independent of each other, Eq. (8) gives the pseudoadditivity relation [ref]. However, it is converse to the case
of the original Tsallis entropy in that q > 1 yields superadditivity and q < 1 subadditivity. It is worth mentioning
that the concept of nonextensive conditional entropy in the framework of the original Tsallis entropy was first
introduced for discussing quantum entanglement in Ref. [11]. From this theorem, we immediately have the following
corollary concerning the equivocation.
Corollary.

$$ H_q(Y|X) = \frac{H_q(Y,Z|X) - H_q(Z|Y,X) + (q-1)\left\{ H_q(X)H_q(Y,Z|X) - H_q(X,Y)H_q(Z|Y,X) \right\}}{1 + (q-1)H_q(X)}. \tag{13} $$

Proof.

In Eq. (8), when we regard Y as (Y, Z), we have

$$ H_q(X,Y,Z) = H_q(X) + H_q(Y,Z|X) + (q-1)H_q(X)H_q(Y,Z|X); \tag{14} $$

on the other hand, when we regard X as (Y, X) and Y as Z, we get

$$ H_q(X,Y,Z) = H_q(X,Y) + H_q(Z|Y,X) + (q-1)H_q(X,Y)H_q(Z|Y,X). \tag{15} $$

Subtracting the two equations side by side and solving for $H_q(Y|X)$ with the help of Eq. (8), we obtain
the corollary. □

Remark: In the additive limit ($q \to 1$), we recover the relation $H(Y|X) = H(Y,Z|X) - H(Z|Y,X)$.

Moreover, with the help of Eq. (13), we have the following theorem.

Theorem 2. Hierarchical structure of entropy $H_q$

The joint entropy of n random variables $X_1, X_2, \ldots, X_n$ satisfies

$$ H_q(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} \left[ 1 + (q-1)H_q(X_{i-1}, \ldots, X_1) \right] H_q(X_i | X_{i-1}, \ldots, X_1), \tag{16} $$

where the i = 1 term is understood as $H_q(X_1)$.

Proof.

From Eq. (8), $H_q(X_1,X_2) = H_q(X_1) + [1 + (q-1)H_q(X_1)] H_q(X_2|X_1)$ holds. Next, from Eq. (14), we have

$$ H_q(X_1,X_2,X_3) = H_q(X_1) + H_q(X_2,X_3|X_1) + (q-1)H_q(X_1)H_q(X_2,X_3|X_1). \tag{17} $$

Since $H_q(X_2,X_3|X_1)$ is written using Eq. (13) as

$$ H_q(X_2,X_3|X_1) = H_q(X_2|X_1) + \frac{1 + (q-1)H_q(X_1,X_2)}{1 + (q-1)H_q(X_1)} H_q(X_3|X_2,X_1), \tag{18} $$

Eq. (17) can be rewritten as

$$ H_q(X_1,X_2,X_3) = H_q(X_1) + [1 + (q-1)H_q(X_1)]H_q(X_2|X_1) + [1 + (q-1)H_q(X_1,X_2)]H_q(X_3|X_2,X_1). \tag{19} $$

Similarly, repeated application of the corollary gives the theorem. □
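
The hierarchical structure of Eq. (16) can be illustrated for n = 3 with the sketch below. It relies on the closed form $1 + (q-1)H_q(Y|X) = \sum_x p^q(x) / \sum_{x,y} p^q(x,y)$, which follows from combining Eqs. (6) and (11); the dictionary-based representation of the joint distribution and the particular weights are assumptions made only for this example.

```python
from itertools import product

def q_sum(probs, q):
    return sum(p ** q for p in probs if p > 0)

def entropy_q(probs, q):
    """H_q of Eq. (3), for q != 1."""
    s = q_sum(probs, q)
    return (1.0 - s) / ((q - 1.0) * s)

def marginal(joint, k):
    """Marginal of {(x1, ..., xn): p} on the first k coordinates."""
    out = {}
    for xs, p in joint.items():
        out[xs[:k]] = out.get(xs[:k], 0.0) + p
    return out

def cond_entropy_q(joint, k, q):
    """H_q(X_{k+1}, ..., X_n | X_k, ..., X_1), using
    1 + (q-1) H_q(rest | first k) = sum p^q(first k) / sum p^q(all)."""
    ratio = q_sum(marginal(joint, k).values(), q) / q_sum(joint.values(), q)
    return (ratio - 1.0) / (q - 1.0)

def chain_rule_rhs(joint, n, q):
    """Right-hand side of Eq. (16)."""
    total = 0.0
    for i in range(1, n + 1):
        h_prev = entropy_q(marginal(joint, i - 1).values(), q)  # 0 when i = 1
        h_cond = cond_entropy_q(marginal(joint, i), i - 1, q)
        total += (1.0 + (q - 1.0) * h_prev) * h_cond
    return total

# Illustrative correlated distribution over three binary variables (an assumption).
weights = {xs: 1 + xs[0] + 2 * xs[1] * xs[2] + xs[0] * xs[2]
           for xs in product(range(2), repeat=3)}
norm = sum(weights.values())
joint = {xs: w / norm for xs, w in weights.items()}

for q in (0.7, 1.8):
    print(q, entropy_q(joint.values(), q), chain_rule_rhs(joint, 3, q))
```

For each q the two printed values should coincide up to rounding, since the sum in Eq. (16) telescopes.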

Remark: In the additive limit, we get $H(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} H(X_i | X_{i-1}, \ldots, X_1)$, which states
that the entropy of n variables is the sum of the conditional entropies (chain rule).

From the relation Eq. (16), we need all the joint entropies below the level of n random variables to obtain the joint
entropy $H_q(X_1, \ldots, X_n)$; the situation is similar to the BBGKY hierarchy of the N-body distribution function.
Let us next define the mutual information $I_q(Y;X)$, which quantifies the amount of information that can be gained
from one event X about another event Y,

$$ I_q(Y;X) \equiv H_q(Y) - H_q(Y|X) = \frac{H_q(X) + H_q(Y) - H_q(X,Y) + (q-1)H_q(X)H_q(Y)}{1 + (q-1)H_q(X)}. \tag{20} $$

Therefore $I_q(Y;X)$ expresses the reduction in the uncertainty of Y due to the acquisition of knowledge of X. Here we
postulate that the mutual information in the nonadditive case is non-negative. The non-negativity may be considered a
requirement, rather than something to be proved, in order to be consistent with the usual additive mutual information.
$I_q(Y;X)$ also converges to the usual mutual information $I(Y;X) = H(Y) - H(Y|X) = H(X) + H(Y) - H(X,Y)$
in the additive case ($q \to 1$). We note that the mutual information of a random variable with itself is the entropy
itself, $I_q(X;X) = H_q(X)$. When X and Y are independent variables, we have $I_q(Y;X) = 0$. Then, we have the
following theorem.

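Theorem 3. Independence bound on entropy $H_q$

$$ H_q(X_1, X_2, \ldots, X_n) \le \sum_{i=1}^{n} \left[ 1 + (q-1)H_q(X_{i-1}, \ldots, X_1) \right] H_q(X_i) \tag{21} $$

with equality if and only if the $X_i$ are independent.
Proof.

From the assumption $I_q(X;Y) \ge 0$ introduced above, we have

$$ \sum_{i=1}^{n} H_q(X_i | X_{i-1}, \ldots, X_1) \le \sum_{i=1}^{n} H_q(X_i) \tag{22} $$

with equality if and only if each $X_i$ is independent of $X_{i-1}, \ldots, X_1$. The inequality holds term by term, and
each weight $1 + (q-1)H_q(X_{i-1}, \ldots, X_1)$ is positive because by Eq. (3) it equals the inverse of a sum of
$p^q$. Then the theorem follows from the previous theorem, Eq. (16). □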
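
The two limiting statements about the mutual information, and the equality case of Eq. (21) for n = 2, can be reproduced with the following sketch. It evaluates only the right-hand side of Eq. (20); the distributions are illustrative assumptions, and no attempt is made to test the postulated non-negativity for general joint distributions.

```python
def q_sum(probs, q):
    return sum(p ** q for p in probs if p > 0)

def entropy_q(probs, q):
    """H_q of Eq. (3), for q != 1."""
    s = q_sum(probs, q)
    return (1.0 - s) / ((q - 1.0) * s)

def mutual_info_q(joint, q):
    """I_q(Y;X) as the right-hand side of Eq. (20); joint[x][y] = p(x, y)."""
    h_x = entropy_q([sum(row) for row in joint], q)
    h_y = entropy_q([sum(col) for col in zip(*joint)], q)
    h_xy = entropy_q([p for row in joint for p in row], q)
    return (h_x + h_y - h_xy + (q - 1.0) * h_x * h_y) / (1.0 + (q - 1.0) * h_x)

p_x = [0.6, 0.4]        # illustrative marginals (assumptions)
p_y = [0.2, 0.5, 0.3]
independent = [[a * b for b in p_y] for a in p_x]   # p(x, y) = p(x) p(y)
diagonal = [[0.6, 0.0], [0.0, 0.4]]                 # Y is a copy of X

for q in (0.5, 2.0):
    print(q, mutual_info_q(independent, q),          # vanishes up to rounding
          mutual_info_q(diagonal, q), entropy_q(p_x, q))
```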
B. The generalized KL entropy



The KL entropy, or relative entropy, is a measure of the distance between two probability distributions $p(x)$ and
$p'(x)$. Here, we define it as the normalized q-expectation value of the change of the nonadditive information content
$\Delta I_q \equiv I_q(p'(x)) - I_q(p(x))$ [ref],

$$ D_q(p(x) \| p'(x)) \equiv \frac{\sum_{x\in\mathcal{H}} p^q(x) \Delta I_q}{\sum_{x\in\mathcal{H}} p^q(x)} = \frac{\sum_{x\in\mathcal{H}} p^q(x) \left( \ln_q p(x) - \ln_q p'(x) \right)}{\sum_{x\in\mathcal{H}} p^q(x)}. \tag{23} $$

We note that the above generalized KL entropy satisfies the form-invariant structure which was introduced in
Refs. [ref]. We review the positivity of the generalized KL entropy in the case of q > 0, which can be considered a
necessary property for developing the IT.
Theorem 4. Information inequality
For q > 0, we have

$$ D_q(p(x) \| p'(x)) \ge 0 \tag{24} $$

with equality if and only if $p(x) = p'(x)$ for all $x \in \mathcal{H}$.
Proof.

The outline of the proof is the same as the one in Refs. [ref] except for the factor $\sum_{x\in\mathcal{H}} p^q(x)$. From
the definition Eq. (23),

$$ D_q(p(x) \| p'(x)) = \frac{1}{1-q} \sum_x p(x) \left\{ 1 - \left( \frac{p'(x)}{p(x)} \right)^{1-q} \right\} \Big/ \sum_x p^q(x) \tag{25} $$

$$ \ge \frac{1}{1-q} \left\{ 1 - \left( \sum_x p(x) \frac{p'(x)}{p(x)} \right)^{1-q} \right\} \Big/ \sum_x p^q(x) = 0, \tag{26} $$

where Jensen's inequality for a convex function has been used: $\sum_x p(x) f(x) \ge f\left( \sum_x p(x) x \right)$ with
$f(x) = -\ln_q(x)$, $f''(x) > 0$. We have equality in the second line if and only if $p'(x)/p(x) = 1$ for all x, and
accordingly $D_q(p(x) \| p'(x)) = 0$.
□
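
A minimal sketch of Eq. (23) is given below; it evaluates the generalized KL entropy for a pair of strictly positive distributions and several values of q > 0. The two distributions are illustrative assumptions.

```python
import math

def ln_q(x, q):
    """q-logarithm; reduces to the natural logarithm at q = 1."""
    if q == 1.0:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def kl_q(p, r, q):
    """Generalized KL entropy D_q(p || r) of Eq. (23); all entries must be > 0."""
    num = sum(pi ** q * (ln_q(pi, q) - ln_q(ri, q)) for pi, ri in zip(p, r))
    den = sum(pi ** q for pi in p)
    return num / den

p = [0.5, 0.3, 0.2]   # illustrative distributions (assumptions)
r = [0.2, 0.2, 0.6]
for q in (0.5, 1.0, 2.0):
    print(q, kl_q(p, r, q), kl_q(p, p, q))  # positive, then zero
```

At q = 1 the function reduces to the ordinary KL divergence in nats.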



III. SOURCE CODING THEOREM 



Having presented some properties of the nonadditive entropy and the generalized KL entropy as preliminaries,
we are now in a position to address our main result, namely that Shannon's source coding theorem can be extended to
the nonadditive case. As an example, let us consider encoding a sequence of source letters generated from an
information source into a sequence of binary codewords. If a code is allocated to four source letters $X_1, X_2, X_3, X_4$
as 0, 10, 110, 111 respectively, the source sequence $X_2X_4X_3X_2$ is coded into 1011111010. On the other hand, if another
code assigns them as 0, 1, 00, 11, then the codeword becomes 111001. The difference between the two codes is striking
when decoding. In the former codeword, the first two letters 10 correspond to $X_2$ and are not the beginning of any
other codeword, so we can identify $X_2$. Next, there is no codeword corresponding to 1 or 11, but 111 is a codeword
and is decoded into $X_4$. Then the next 110 is decoded into $X_3$, leaving 10, which is decoded into $X_2$. Therefore we
can uniquely decode the codeword. In the latter case, however, we can interpret the first three letters 111
as $X_2X_2X_2$, $X_2X_4$ or $X_4X_2$. Namely, this code cannot be uniquely decoded into the source letters that gave rise
to it. Accordingly, we need to deal with the so-called prefix code or instantaneous code, such as the former one.
A prefix code is a code in which no codeword is a prefix of any other codeword (prefix condition code) [ref]. We recall
that any code which satisfies the prefix condition over an alphabet of size D (D = 2 is the binary case) must satisfy
the Kraft inequality [ref]

$$ \sum_{i=1}^{M} D^{-l_i} \le 1, \tag{27} $$

where $l_i$ is the length of the ith codeword (i = 1, ..., M). Moreover, if a code is uniquely decodable, the Kraft
inequality holds for it [ref]. We usually hope to encode the sequence of source letters into a sequence of codewords
that is as short as possible; that is, our problem is to find a prefix condition code with the minimum average length
of a set of codewords $\{l_i\}$. The optimal code is given by minimizing the following functional constrained by the
Kraft inequality,

$$ J = \frac{\sum_i p_i^q l_i}{\sum_i p_i^q} + \lambda \sum_i D^{-l_i}, \tag{28} $$

where $p_i$ is the probability of realization of the word length $l_i$ and $\lambda$ is a Lagrange multiplier. We have
assumed equality in the Kraft inequality and have neglected the integer constraint on $l_i$. Differentiating with
respect to $l_i$ and setting the derivative to 0 yields

$$ D^{-l_i} = \frac{p_i^q}{\left( \sum_j p_j^q \right) \lambda \log D}, \tag{29} $$

where $\log$ without a base denotes the natural logarithm. Here, it is worth noting that from the Kraft inequality the
Lagrange multiplier $\lambda$ satisfies $\lambda \ge (\log D)^{-1}$. Furthermore, when the equality holds, the quantity
$D^{-l_i}$ for the optimal codeword length $l_i^*$ is expressed as

$$ D^{-l_i^*} = \frac{p_i^q}{\sum_j p_j^q}. \tag{30} $$

Therefore $l_i^*$ can be written as $\log_D \left( \sum_i p_i^q \right) - q \log_D p_i$, and in the additive limit we obtain
$l_i^* = -\log_D p_i$. However, we cannot always determine the optimal codeword length in this way, since the $l_i$
must be integers. We have the following theorem.
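
The two example codes discussed at the beginning of this section can be checked mechanically. The sketch below tests the prefix condition, evaluates the Kraft sum of Eq. (27), and enumerates all parsings of a bit string by exhaustive search; the helper names and the symbol labels "X1" to "X4" are assumptions introduced for the illustration.

```python
def is_prefix_free(codewords):
    """True if no codeword is a proper prefix of another one."""
    words = list(codewords)
    return not any(a != b and b.startswith(a) for a in words for b in words)

def kraft_sum(codewords, D=2):
    """Left-hand side of the Kraft inequality, Eq. (27)."""
    return sum(D ** -len(w) for w in codewords)

def encode(sequence, code):
    return "".join(code[s] for s in sequence)

def parses(bits, code):
    """All ways of splitting `bits` into codewords (exhaustive search)."""
    if not bits:
        return [[]]
    found = []
    for symbol, word in code.items():
        if bits.startswith(word):
            found += [[symbol] + rest for rest in parses(bits[len(word):], code)]
    return found

code_a = {"X1": "0", "X2": "10", "X3": "110", "X4": "111"}
code_b = {"X1": "0", "X2": "1", "X3": "00", "X4": "11"}
message = ["X2", "X4", "X3", "X2"]

print(encode(message, code_a), is_prefix_free(code_a.values()), kraft_sum(code_a.values()))
print(encode(message, code_b), is_prefix_free(code_b.values()), kraft_sum(code_b.values()))
print(parses("111", code_b))  # three different readings of 111
```

The first code has Kraft sum 1 and a unique parsing, whereas the second violates both the prefix condition and the Kraft inequality.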
Theorem 5.

The average codeword length $\langle L \rangle_q$ of any prefix code for a random variable X satisfies

$$ \langle L \rangle_q \ge H_q(X) \tag{31} $$

with equality if and only if $p_i = [1 - (1-q) l_i]^{\frac{1}{1-q}}$.
Proof.

From Eq. (23) the generalized KL entropy between two distributions p and r is written as

$$ D_q(p \| r) = \frac{1 - \sum_i p_i^q}{(1-q) \sum_i p_i^q} - \frac{\sum_i p_i^q \left( r_i^{1-q} - 1 \right)}{(1-q) \sum_i p_i^q}. \tag{32} $$

If we take the information content associated with the probability $r_i$ as the i-th codeword length $l_i$, i.e.,
$-(r_i^{1-q} - 1)/(1-q) = l_i$, then the average codeword length can be written from Eq. (32) as

$$ \langle L \rangle_q = H_q(X) + D_q(p \| r). \tag{33} $$

Since $D_q(p \| r) \ge 0$ for q > 0 from Theorem 4, we have the theorem. The equality holds if and only if $p_i = r_i$. □
We note that the relation $-(r_i^{1-q} - 1)/(1-q) = l_i$ means that the codeword length equals the information
content of a distribution r different from p, $l_i = I_q(r_i)$. When the equality is realized in the above theorem, we
can derive an interesting interpretation of the nonadditivity parameter q. The condition of the equality states that
the probability is expressed as a factor like that of Tsallis's canonical ensemble in an i-wise manner. Then each
$l_i$ has a limit in length corresponding to q, namely $l_i^{\max} = 1/(1-q)$.
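
The decomposition in Eq. (33) is an algebraic identity once $r_i$ is obtained from $l_i$ by inverting $l_i = -\ln_q r_i$, and it can be checked with the sketch below. The lengths, the distribution and the value q = 1.5 are illustrative assumptions; note that the $r_i$ constructed this way need not sum to one, so the sketch verifies only the identity (33), not the inequality (31).

```python
import math

def ln_q(x, q):
    """q-logarithm, for q != 1."""
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def avg_q(values, p, q):
    """Normalized q-expectation value of `values` with respect to p."""
    weights = [pi ** q for pi in p]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def entropy_q(p, q):
    """H_q as the normalized q-expectation of I_q = -ln_q p, Eq. (3)."""
    return avg_q([-ln_q(pi, q) for pi in p], p, q)

def kl_q(p, r, q):
    """D_q(p || r), Eq. (23)."""
    return avg_q([ln_q(pi, q) - ln_q(ri, q) for pi, ri in zip(p, r)], p, q)

def r_from_length(l, q):
    """Invert l = -ln_q r: r = [1 - (1-q) l]^(1/(1-q)); needs 1 - (1-q) l > 0."""
    return (1.0 - (1.0 - q) * l) ** (1.0 / (1.0 - q))

p = [0.5, 0.25, 0.125, 0.125]   # illustrative source distribution (assumption)
lengths = [1, 2, 3, 3]          # illustrative codeword lengths (assumption)
q = 1.5
r = [r_from_length(l, q) for l in lengths]

lhs = avg_q(lengths, p, q)               # <L>_q
rhs = entropy_q(p, q) + kl_q(p, r, q)    # H_q(X) + D_q(p || r)
print(lhs, rhs)

# For q < 1 the admissible lengths are bounded by l_max = 1/(1-q).
l_max = 1.0 / (1.0 - 0.5)
print(-ln_q(1e-12, 0.5), l_max)
```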

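Since $\log_D \left( \sum_i p_i^q \right) - q \log_D p_i$ obtained by the optimization problem is not always equal to an
integer, we impose the integer condition on the codewords by rounding it up as
$l_i = \lceil \log_D \left( \sum_i p_i^q \right) - q \log_D p_i \rceil$, where $\lceil x \rceil$ denotes the smallest integer
$\ge x$. Moreover, the relation
$\lceil \log_D \left( \sum_i p_i^q \right) - q \log_D p_i \rceil \ge \log_D \left( \sum_i p_i^q \right) - q \log_D p_i$ leads to

$$ \sum_i D^{-l_i} \le \sum_i D^{-\left( \log_D \left( \sum_j p_j^q \right) - q \log_D p_i \right)} = \sum_i \frac{p_i^q}{\sum_j p_j^q} = 1. \tag{34} $$

Hence $\{l_i\}$ satisfies the Kraft inequality. Moreover, we have the following theorem.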
Theorem 6.

The average codeword length assigned by $l_i = \lceil \log_D \left( \sum_i p_i^q \right) - q \log_D p_i \rceil$ satisfies

$$ H_q(p) + D_q(p \| r) \le \langle L \rangle_q < H_q(p) + D_q(p \| r) + 1. \tag{35} $$

Proof.

The integer codeword lengths satisfy

$$ \log_D \left( \sum_i p_i^q \right) - q \log_D p_i \le l_i < \log_D \left( \sum_i p_i^q \right) - q \log_D p_i + 1. \tag{36} $$

Multiplying by $p_i^q / \sum_i p_i^q$ and summing over i with Eq. (33) yields the theorem. □
This means that a distribution different from the optimal one provokes a correction of $D_q(p \| r)$ in the average
codeword length, as it does in the additive case.
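
The rounding step behind Eqs. (34) and (36) can be illustrated as follows. The sketch computes the real-valued lengths $l_i^*$ from Eq. (30), rounds them up, and reports the Kraft sums and the normalized q-averages of both sets of lengths. The distribution, the value of q and the function names are assumptions chosen for the example.

```python
import math

def optimal_lengths(p, q, D=2):
    """Real-valued l_i* = log_D(sum_j p_j^q) - q log_D p_i, from Eq. (30)."""
    s = sum(pj ** q for pj in p)
    return [math.log(s, D) - q * math.log(pi, D) for pi in p]

def rounded_lengths(p, q, D=2):
    """Integer lengths obtained by rounding l_i* up."""
    return [math.ceil(l) for l in optimal_lengths(p, q, D)]

def kraft_sum(lengths, D=2):
    return sum(D ** -l for l in lengths)

def avg_q(values, p, q):
    """Normalized q-expectation value of `values` with respect to p."""
    weights = [pi ** q for pi in p]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

p = [0.4, 0.3, 0.2, 0.1]   # illustrative source distribution (assumption)
q = 0.8                    # illustrative nonadditivity index (assumption)
l_star = optimal_lengths(p, q)
l_int = rounded_lengths(p, q)

print(l_star, kraft_sum(l_star))   # Kraft sum equals 1 up to rounding
print(l_int, kraft_sum(l_int))     # Kraft sum does not exceed 1
print(avg_q(l_star, p, q), avg_q(l_int, p, q))
```

The averaged integer lengths exceed the averaged real-valued ones by less than one symbol, which is the content of Eq. (36) after averaging.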

We have so far discussed the properties of the nonadditive entropy in the case of one letter. Next, let us consider the
situation in which we transmit a sequence of n letters from the source in such a way that each letter is generated
independently as identically distributed random variables according to $p(x)$. Then the average codeword length per
letter $\langle L_n \rangle_q = \langle l(X_1, \ldots, X_n) \rangle_q / n$ is bounded as we derived in the preceding theorem,

$$ H_q(X_1, \ldots, X_n) \le \langle l(X_1, \ldots, X_n) \rangle_q < H_q(X_1, \ldots, X_n) + 1. \tag{37} $$

Since we are now considering independent, identically distributed random variables $X_1, \ldots, X_n$, we obtain

$$ \frac{\sum_{i=1}^{n} \left[ 1 + (q-1)H_q(X_{i-1}, \ldots, X_1) \right] H_q(X_i)}{n} \le \langle L_n \rangle_q < \frac{\sum_{i=1}^{n} \left[ 1 + (q-1)H_q(X_{i-1}, \ldots, X_1) \right] H_q(X_i)}{n} + \frac{1}{n}, \tag{38} $$

where we have used Theorem 3. This relation can be considered to be the generalized source coding theorem for
a finite number of letters. We note that we obtain $H(X) \le \langle L_n \rangle < H(X) + 1/n$ in the additive limit since
$H(X_1, \ldots, X_n) = \sum_i H(X_i) = nH(X)$ holds.

IV. THE GENERALIZED FANO'S INEQUALITY 

Fano's inequality is an essential ingredient in proving the converse to the channel coding theorem, which states that
the probability of error that arises over a channel is bounded away from zero when the transmission rate exceeds the
channel capacity [ref]. In the estimation of an original message generated from the information source, the original
variable X may be estimated as X' on the side of the destination. Therefore, we introduce the probability of error
$P_e = \Pr\{X' \ne X\}$ due to the noise of the channel through which the signal is transmitted. With an error random
variable E defined as

$$ E = \begin{cases} 1 & \text{if } X' \ne X \\ 0 & \text{if } X' = X \end{cases} \tag{39} $$

we have the following theorem, which is considered to be the generalized (nonadditive version of) Fano's inequality.
Theorem 7. The generalized Fano's inequality

$$ H_q(X|Y) \le H_q(P_e) + \frac{1 + (q-1)H_q(E,Y)}{1 + (q-1)H_q(Y)} \cdot \frac{P_e^q}{P_e^q + (1-P_e)^q} \cdot \frac{1 - (|\mathcal{H}| - 1)^{1-q}}{(q-1)(|\mathcal{H}| - 1)^{1-q}}, \tag{40} $$

where $|\mathcal{H}|$ denotes the size of the alphabet of the information source.
Proof.

The proof can be done along the lines of the usual Shannon additive case (e.g., [ref]). Using the corollary Eq. (13),
we have two different expressions for $H_q(E,X|Y)$,

$$ H_q(E,X|Y) = H_q(X|Y) + \frac{1 + (q-1)H_q(X,Y)}{1 + (q-1)H_q(Y)} H_q(E|X,Y) \tag{41} $$

and

$$ H_q(E,X|Y) = H_q(E|Y) + \frac{1 + (q-1)H_q(E,Y)}{1 + (q-1)H_q(Y)} H_q(X|E,Y), \tag{42} $$

where we have used the corollary by regarding $H_q(E,X|Y)$ as $H_q(X,E|Y)$ in the second expression Eq. (42). We
are now considering the following situation. That is, we wish to know the genuine random variable Y, which can be
related to X by the nonadditive conditional information content $I_q(y|x)$. Hence we calculate X', an estimate of
X, as a function of Y such as $g(Y)$ [ref]. Then we see that $H_q(E|X,Y)$ becomes 0, since E is a function of X and Y by
the definition Eq. (39). Therefore the first expression of $H_q(E,X|Y)$ reduces to $H_q(X|Y)$. On the other hand, from
the non-negativity property of $I_q(E;Y)$ we assumed and from the relation $H_q(E) = H_q(P_e)$, we can evaluate
$H_q(E|Y)$ as $H_q(E|Y) \le H_q(E) = H_q(P_e)$. Moreover, $H_q(X|E,Y)$ can be written as



$$ H_q(X|E,Y) = \frac{\sum_E (\Pr\{E\})^q H_q(X|Y,E)}{\sum_E (\Pr\{E\})^q} = \frac{(1-P_e)^q H_q(X|Y,0) + P_e^q H_q(X|Y,1)}{P_e^q + (1-P_e)^q}. \tag{43} $$

For E = 0, $g(Y)$ gives X, resulting in $H_q(X|Y,0) = 0$, and for E = 1, we have an upper bound on $H_q(X|Y,1)$ given by
the maximum entropy of the remaining $|\mathcal{H}| - 1$ outcomes, which by Eq. (3) is attained for the uniform
distribution,

$$ H_q(X|Y,1) \le \frac{1 - (|\mathcal{H}| - 1)^{1-q}}{(q-1)(|\mathcal{H}| - 1)^{1-q}}. \tag{44} $$

Then it follows from Eq. (43) that

$$ \left[ P_e^q + (1-P_e)^q \right] H_q(X|E,Y) \le P_e^q \, \frac{1 - (|\mathcal{H}| - 1)^{1-q}}{(q-1)(|\mathcal{H}| - 1)^{1-q}}. \tag{45} $$

Combining the above results with Eqs. (41) and (42), we have the theorem. □

Remark: In the additive limit, we obtain the usual Fano's inequality $H(X|Y) \le H(P_e) + P_e \ln(|\mathcal{H}| - 1)$ in nat units.
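
The additive limit stated in the remark can be checked numerically by evaluating the right-hand side of Eq. (40) for q close to 1 and comparing it with the usual Fano bound. In the sketch below the entropies $H_q(E,Y)$ and $H_q(Y)$ are passed in as plain numbers, since they depend on a channel model that is not specified here; these inputs, like the function names, are assumptions made for the illustration.

```python
import math

def entropy_q(probs, q):
    """H_q of Eq. (3), for q != 1."""
    s = sum(p ** q for p in probs if p > 0)
    return (1.0 - s) / ((q - 1.0) * s)

def fano_bound_q(pe, size, q, h_ey, h_y):
    """Right-hand side of Eq. (40).

    pe: error probability P_e; size: alphabet size |H| (at least 2);
    h_ey, h_y: values of H_q(E,Y) and H_q(Y), supplied by the caller."""
    ratio = (1.0 + (q - 1.0) * h_ey) / (1.0 + (q - 1.0) * h_y)
    weight = pe ** q / (pe ** q + (1.0 - pe) ** q)
    m = (size - 1) ** (1.0 - q)
    max_entropy = (1.0 - m) / ((q - 1.0) * m)   # Eq. (44)
    return entropy_q([pe, 1.0 - pe], q) + ratio * weight * max_entropy

def fano_bound_shannon(pe, size):
    """Usual Fano bound H(P_e) + P_e ln(|H| - 1) in nats."""
    h = -pe * math.log(pe) - (1.0 - pe) * math.log(1.0 - pe)
    return h + pe * math.log(size - 1)

# Illustrative numbers (assumptions): P_e = 0.1, |H| = 4, H_q(E,Y) = 0.7, H_q(Y) = 0.5.
for q in (1.0 + 1e-6, 1.5, 0.5):
    print(q, fano_bound_q(0.1, 4, q, 0.7, 0.5))
print(fano_bound_shannon(0.1, 4))
```

For q approaching 1 the generalized bound approaches the Shannon value, because all three q-dependent factors in Eq. (40) reduce to 1, $P_e$ and $\ln(|\mathcal{H}| - 1)$ respectively.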

V. CONCLUDING REMARKS 

We have attempted to extend Shannon's additive IT to the nonadditive case by using the nonadditive information
content Eq. (1). In developing the nonadditive IT, this postulate of the nonadditive information content seems to be a
plausible selection in terms of the unification of the meaning of the average throughout the entire theory. As a
consequence, the information entropy becomes the modified type of Tsallis nonextensive entropy. We have shown the
properties of the nonadditive information entropy, conditional entropy and joint entropy in the form of theorems,
which are necessary elements for developing IT. These results recover the usual Shannon ones in the additive limit
($q \to 1$). Moreover, the source coding theorem can be generalized to the nonadditive case. As we have seen in
Theorem 5, the nonadditivity of the information content can be regarded as determining the longest codeword we can
transmit to the channel. The philosophy of the present attempt can be positioned as the reverse of Jaynes's pioneering
work [ref]. Jaynes brought the concept of IT to statistical mechanics in the form of maximizing the entropy of a system
(Jaynes' maximum entropy principle). The information theoretical approach to statistical mechanics is now considered
to be very robust in discussing some areas of physics. In turn, we have approached IT in a nonadditive way. We believe
that the present consideration based on the nonadditive information content may trigger some practical future
applications in the area of information processing.